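Statistics and Econometrics | Difference-in-Differences (DID) Explained (1): Traditional DID
First, this series tries to present the general DID workflow described above (model setup, testing and plotting parallel trends, and marginal effects) in a single post that readers can use as a reference;
Second, this series uses one set of simulated data to bring Standard DID and Time-varying DID together and to compare how their equations are specified;
Third, in practice, when Time-varying DID uses the Event Study Approach (ESA) to test parallel trends, it is possible that not every unit eventually receives the policy intervention. The ESA equation and its code are written differently depending on whether all units or only some units are eventually treated, and this difference is a main concern of this series.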
clear all
set obs 60
set seed 10101
gen id =_n
* Expand each observation 11-fold, then drop the first 60 observations: 60*11 - 60 = 600 rows, i.e. a panel of 60 units over 10 years
expand 11
drop in 1/60
bys id: gen time = _n+1999
xtset id time
* Generate the covariates x1 and x2
gen x1 = rnormal(1,7)
gen x2 = rnormal(2,5)
sort time id
by time: gen ind = _n
sort id time
by id: gen T = _n
gen D = 0
replace D = 1 if id > 29
gen post = 0
replace post = 1 if time >= 2005
* Save the basic data structure as DID_Basic_Simu.dta; by default it is written to the current working directory
save "DID_Basic_Simu.dta",replace
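Before simulating outcomes, it can help to confirm that the saved panel has the intended shape. The checks below are only a sketch and assume DID_Basic_Simu.dta sits in the current working directory. They rely on two facts that follow from the code above: ind is the within-year rank of id and therefore equals id, and T counts years within each unit and therefore equals time - 1999.

```stata
* Sanity checks on the simulated panel (illustrative only)
use "DID_Basic_Simu.dta", clear
assert _N == 600                    // 60 units x 10 years
isid id time                        // (id, time) uniquely identifies a row
assert inrange(time, 2000, 2009)    // years run from 2000 to 2009
assert ind == id                    // ind plays the role of a unit effect
assert T == time - 1999             // T plays the role of a linear time trend
tabulate D post                     // cell sizes by treatment group and period
```

Because D equals 1 for id 30 through 60, the treated group has 31 units and the control group has 29.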
Simulating a Standard DID with a time-invariant policy effect
use "DID_Basic_Simu.dta", clear
bysort id: gen y0 = 10 + 5 * x1 + 3 * x2 + T + ind + rnormal()
bysort id: gen y1 = 10 + 5 * x1 + 3 * x2 + T + ind + rnormal() if time < 2005
bysort id: replace y1 = 10 + 5 * x1 + 3 * x2 + 10 + T + ind + rnormal() if time >= 2005
gen y = y0 + D * (y1 - y0)
xtreg y x1 x2 , fe r
predict e,ue
binscatter e time, line(connect) by(D)
graph export "article1_1.png",as(png) replace width(800) height(600)
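The data-generating process coded above can be written as a potential-outcomes model. The notation below is my own shorthand for the Stata lines, not something taken from the original post: the superscripts 0 and 1 mark the untreated and treated outcomes, and the two error terms are separate standard normal draws, as in the two rnormal() calls.

```latex
\begin{aligned}
y^{0}_{it} &= 10 + 5\,x_{1,it} + 3\,x_{2,it} + T_t + \mathrm{ind}_i + \varepsilon^{0}_{it},\\
y^{1}_{it} &= 10 + 5\,x_{1,it} + 3\,x_{2,it} + T_t + \mathrm{ind}_i
              + 10\cdot\mathbf{1}\{t \ge 2005\} + \varepsilon^{1}_{it},\\
y_{it}     &= y^{0}_{it} + D_i\,\bigl(y^{1}_{it} - y^{0}_{it}\bigr).
\end{aligned}
```

For a treated unit in 2005 or later, the effect is 10 plus the difference of the two noise terms, so the true average policy effect is 10.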
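This model can be estimated with several Stata commands, including reg, xtreg, areg, and reghdfe; the four commands are compared in the table further below. This article mainly shows the output of reghdfe.
reghdfe y c.D#c.post x1 x2, absorb(id time) vce(robust)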
(converged in 3 iterations)
HDFE Linear regression Number of obs = 600
Absorbing 2 HDFE groups F( 3, 528) = 262423.36
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.9995
Adj R-squared = 0.9995
Within R-sq. = 0.9993
Root MSE = 1.0154
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
c.D#c.post | 9.868932 .1654078 59.66 0.000 9.543994 10.19387
x1 | 4.998123 .0060976 819.69 0.000 4.986144 5.010101
x2 | 3.003366 .0084909 353.71 0.000 2.986685 3.020046
Absorbed degrees of freedom:
Absorbed FE | Num. Coefs. = Categories - Redundant
id | 60 60 0
time | 9 10 1
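The regression just shown corresponds to a two-way fixed effects DID equation. The symbols are labels chosen here for illustration: the unit effect is absorbed by id and the year effect by time, which is why D and post do not enter on their own.

```latex
y_{it} = \beta\,(D_i \times post_t) + \gamma_1 x_{1,it} + \gamma_2 x_{2,it}
         + \alpha_i + \lambda_t + u_{it}
```

In the output above the estimate of the interaction coefficient is 9.869 with a 95% confidence interval of [9.544, 10.194], which contains the true effect of 10 used in the simulation.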
reg y c.D#c.post x1 x2 i.time i.id, r
eststo reg
xtreg y c.D#c.post x1 x2 i.time, r fe
eststo xtreg_fe
areg y c.D#c.post x1 x2 i.time, absorb(id) robust
eststo areg
reghdfe y c.D#c.post x1 x2, absorb(id time) vce(robust)
eststo reghdfe
estout , title("The Comparison of Actual Parameter Values") ///
cells(b(star fmt(%9.3f)) se(par)) ///
stats(N N_g, fmt(%9.0f %9.0g) label(N Groups)) ///
legend collabels(none) varlabels(_cons Constant) keep(x1 x2 c.D#c.post)
The Comparison of Actual Parameter Values

                     reg       xtreg_fe        areg        reghdfe
c.D#c.post         9.869***     9.869***     9.869***     9.869***
                  (0.166)      (0.189)      (0.166)      (0.166)
x1                 4.998***     4.998***     4.998***     4.998***
                  (0.006)      (0.006)      (0.006)      (0.006)
x2                 3.003***     3.003***     3.003***     3.003***
                  (0.008)      (0.008)      (0.008)      (0.008)
N                    600          600          600          600
Groups                             60
* p<0.05, ** p<0.01, *** p<0.001
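The point estimates agree across the four commands, but the xtreg standard error on the interaction term differs (0.189 against 0.166). A likely reason is that xtreg, fe with the robust option reports standard errors clustered on the panel variable, while the other three commands report heteroskedasticity-robust ones. The sketch below checks this by clustering explicitly; it assumes the simulated y is still in memory and that reghdfe is installed.

```stata
* Cluster on id in both commands so the standard errors are comparable
reghdfe y c.D#c.post x1 x2, absorb(id time) vce(cluster id)
xtreg y c.D#c.post x1 x2 i.time, fe vce(cluster id)
```

If the explanation holds, both commands should report a standard error close to the xtreg_fe column of the table.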
Simulating a Standard DID with a time-varying policy effect
use "DID_Basic_Simu.dta", clear
bysort id: gen y0 = 10 + 5 * x1 + 3 * x2 + T + ind + rnormal()
bysort id: gen y1 = 10 + 5 * x1 + 3 * x2 + T + ind + rnormal() if time < 2005
bysort id: replace y1 = 10 + 5 * x1 + 3 * x2 + T*5 + ind + rnormal() if time >= 2005
gen y = y0 + D * (y1 - y0)
xtreg y x1 x2,fe r
predict e,ue
binscatter e time, line(connect) by(D)
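This model can likewise be estimated with several Stata commands, including reg, xtreg, areg, and reghdfe. This article mainly shows the output of reghdfe:
. reghdfe y c.D#c.post x1 x2, absorb(id time) vce(robust)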
(converged in 3 iterations)
HDFE Linear regression Number of obs = 600
Absorbing 2 HDFE groups F( 3, 528) = 44552.05
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.9978
Adj R-squared = 0.9975
Within R-sq. = 0.9964
Root MSE = 2.4029
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
c.D#c.post | 31.84782 .3931208 81.01 0.000 31.07555 32.6201
x1 | 5.014706 .0154247 325.11 0.000 4.984405 5.045008
x2 | 2.973101 .0202111 147.10 0.000 2.933397 3.012805
Absorbed degrees of freedom:
Absorbed FE | Num. Coefs. = Categories - Redundant
id | 60 60 0
time | 9 10 1
reg y c.D#c.post x1 x2 i.time i.id, r
eststo reg
xtreg y c.D#c.post x1 x2 i.time, r fe
eststo xtreg_fe
areg y c.D#c.post x1 x2 i.time, absorb(id) robust
eststo areg
reghdfe y c.D#c.post x1 x2, absorb(id time) vce(robust)
eststo reghdfe
estout , title("The Comparison of Actual Parameter Values") ///
cells(b(star fmt(%9.3f)) se(par)) ///
stats(N N_g, fmt(%9.0f %9.0g) label(N Groups)) ///
legend collabels(none) varlabels(_cons Constant) keep(x1 x2 c.D#c.post)
The Comparison of Actual Parameter Values

                     reg       xtreg_fe        areg        reghdfe
c.D#c.post        31.848***    31.848***    31.848***    31.848***
                  (0.394)      (0.194)      (0.394)      (0.394)
x1                 5.015***     5.015***     5.015***     5.015***
                  (0.015)      (0.016)      (0.015)      (0.015)
x2                 2.973***     2.973***     2.973***     2.973***
                  (0.020)      (0.019)      (0.020)      (0.020)
N                    600          600          600          600
Groups                             60
* p<0.05, ** p<0.01, *** p<0.001
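To see why the pooled coefficient is about 32, it helps to work out the true effect implied by the simulation. In the code, the treated outcome uses T*5 where the untreated outcome uses T, so the effect in year t is 4T, with T = t - 1999. The symbol tau below is my own label for this effect.

```latex
\begin{aligned}
\tau_t &= 5T_t - T_t = 4T_t, \qquad T_t = t - 1999,\\
(\tau_{2005},\dots,\tau_{2009}) &= (24,\ 28,\ 32,\ 36,\ 40),\\
\bar{\tau} &= \tfrac{1}{5}\,(24 + 28 + 32 + 36 + 40) = 32 .
\end{aligned}
```

With a single adoption year and a balanced panel, the single interaction term roughly recovers this average over the five post-policy years, which is consistent with the estimate of 31.848; it says nothing about how the effect grows from year to year.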
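Combining Standard DID with the Event Study Approach
5.1 Flexible DID: the policy effect does not change over time
The interaction terms below are written with Stata's factor variables.
use "DID_Basic_Simu.dta", clear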
bysort id: gen y0 = 10 + 5 * x1 + 3 * x2 + T + ind + rnormal()
bysort id: gen y1 = 10 + 5 * x1 + 3 * x2 + T + ind + rnormal() if time < 2005
bysort id: replace y1 = 10 + 5 * x1 + 3 * x2 + 10 + T + ind + rnormal() if time >= 2005
gen y = y0 + D * (y1 - y0)
tab time, gen(year)
. reghdfe y i.D#i.time x1 x2, vce(robust) absorb(id time)
(converged in 3 iterations)
HDFE Linear regression Number of obs = 600
Absorbing 2 HDFE groups F( 11, 520) = 71307.67
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.9995
Adj R-squared = 0.9995
Within R-sq. = 0.9993
Root MSE = 1.0147
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
D#time |
1 2001 | .4381916 .3796627 1.15 0.249 -.3076697 1.184053
1 2002 | .6093975 .3984095 1.53 0.127 -.1732924 1.392087
1 2003 | .4808495 .3948783 1.22 0.224 -.2949033 1.256602
1 2004 | .1168801 .4088713 0.29 0.775 -.6863626 .9201227
1 2005 | 9.810181 .3870237 25.35 0.000 9.049859 10.5705
1 2006 | 10.48194 .3664986 28.60 0.000 9.761937 11.20194
1 2007 | 9.999201 .3978656 25.13 0.000 9.217579 10.78082
1 2008 | 10.2474 .4087051 25.07 0.000 9.444481 11.05031
1 2009 | 10.45248 .3979999 26.26 0.000 9.670592 11.23436
x1 | 4.996797 .0061877 807.54 0.000 4.984641 5.008953
x2 | 3.004127 .0087679 342.63 0.000 2.986902 3.021352
Absorbed degrees of freedom:
Absorbed FE| Num. Coefs. = Categories - Redundant
id | 60 60 0
time | 9 10 1
. reghdfe y c.D#(c.year2-year10) x1 x2, absorb(id time) vce(robust)
(converged in 3 iterations)
HDFE Linear regression Number of obs = 600
Absorbing 2 HDFE groups F( 11, 520) = 71307.67
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.9995
Adj R-squared = 0.9995
Within R-sq. = 0.9993
Root MSE = 1.0147
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
c.D#c.year2 | .4381916 .3796627 1.15 0.249 -.3076697 1.184053
c.D#c.year3 | .6093975 .3984095 1.53 0.127 -.1732924 1.392087
c.D#c.year4 | .4808495 .3948783 1.22 0.224 -.2949033 1.256602
c.D#c.year5 | .1168801 .4088713 0.29 0.775 -.6863626 .9201227
c.D#c.year6 | 9.810181 .3870237 25.35 0.000 9.049859 10.5705
c.D#c.year7 | 10.48194 .3664986 28.60 0.000 9.761937 11.20194
c.D#c.year8 | 9.999201 .3978656 25.13 0.000 9.217579 10.78082
c.D#c.year9 | 10.2474 .4087051 25.07 0.000 9.444481 11.05031
c.D#c.year10| 10.45248 .3979999 26.26 0.000 9.670592 11.23436
x1| 4.996797 .0061877 807.54 0.000 4.984641 5.008953
x2| 3.004127 .0087679 342.63 0.000 2.986902 3.021352
Absorbed degrees of freedom:
Absorbed FE| Num. Coefs. = Categories - Redundant
id | 60 60 0
time | 9 10 1
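Both commands above estimate the same event-study version of the DID equation, in which the single interaction term is replaced by one interaction per year. The notation is again my own; the year 2000 is the omitted base period, so every coefficient is measured relative to it.

```latex
y_{it} = \sum_{k=2001}^{2009} \beta_k \, D_i \cdot \mathbf{1}\{t = k\}
         + \gamma_1 x_{1,it} + \gamma_2 x_{2,it} + \alpha_i + \lambda_t + u_{it}
```

Under the simulated design, the coefficients for 2001 to 2004 should be close to 0 (parallel pre-trends) and those for 2005 to 2009 should be close to 10.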
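For the post-policy interaction terms (2005 to 2009), the 95% confidence intervals all contain the true effect of 10, so this specification is reliable. The coefficients can be plotted as follows:
coefplot, ///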
keep(c.D#c.year2 c.D#c.year3 c.D#c.year4 c.D#c.year5 c.D#c.year6 c.D#c.year7 c.D#c.year8 c.D#c.year9 c.D#c.year10) ///
coeflabels(c.D#c.year2 = "-4" ///
c.D#c.year3 = "-3" ///
c.D#c.year4 = "-2" ///
c.D#c.year5 = "-1" ///
c.D#c.year6 = "0" ///
c.D#c.year7 = "1" ///
c.D#c.year8 = "2" ///
c.D#c.year9 = "3" ///
c.D#c.year10 = "4") ///
vertical ///
yline(0) ///
ytitle("Coef") ///
xtitle("Years relative to the year of policy adoption") ///
addplot(line @b @at) ///
ciopts(recast(rcap))
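The plot gives a visual check of parallel trends; a joint test of the pre-policy coefficients is a common complement. The line below is a sketch that assumes the reghdfe model with c.D#(c.year2-year10) is still the active estimation result.

```stata
* Joint test that the four pre-policy interaction terms (2001 to 2004) are all zero
testparm c.D#c.year2 c.D#c.year3 c.D#c.year4 c.D#c.year5
```

Failing to reject the null is in line with parallel pre-trends, although it does not prove them.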
5.2 Flexible DID: the policy effect changes over time
use "DID_Basic_Simu.dta", clear
bysort id: gen y0 = 10 + 5 * x1 + 3 * x2 + T + ind + rnormal()
bysort id: gen y1 = 10 + 5 * x1 + 3 * x2 + T + ind + rnormal() if time < 2005
bysort id: replace y1 = 10 + 5 * x1 + 3 * x2 + T * 5 + ind + rnormal() if time >= 2005
gen y = y0 + D * (y1 - y0)
tab time, gen(year)
reghdfe y i.D#i.time x1 x2, vce(robust) absorb(id time)
(converged in 3 iterations)
HDFE Linear regression Number of obs = 600
Absorbing 2 HDFE groups F( 11, 520) = 75233.93
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.9996
Adj R-squared = 0.9996
Within R-sq. = 0.9994
Root MSE = 1.0147
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
D#time |
1 2001 | .4381916 .3796627 1.15 0.249 -.3076697 1.184053
1 2002 | .6093975 .3984095 1.53 0.127 -.1732924 1.392087
1 2003 | .4808495 .3948783 1.22 0.224 -.2949033 1.256602
1 2004 | .1168801 .4088713 0.29 0.775 -.6863626 .9201227
1 2005 | 23.81018 .3870237 61.52 0.000 23.04986 24.5705
1 2006 | 28.48194 .3664986 77.71 0.000 27.76194 29.20194
1 2007 | 31.9992 .3978656 80.43 0.000 31.21758 32.78082
1 2008 | 36.2474 .4087051 88.69 0.000 35.44448 37.05031
1 2009 | 40.45248 .3979999 101.64 0.000 39.67059 41.23436
x1 | 4.996797 .0061877 807.54 0.000 4.984641 5.008953
x2 | 3.004127 .0087679 342.63 0.000 2.986902 3.021352
Absorbed degrees of freedom:
Absorbed FE | Num. Coefs. = Categories - Redundant
id | 60 60 0
time | 9 10 1
reghdfe y c.D#(c.year2-year10) x1 x2 , absorb(id time) vce(robust)
(converged in 3 iterations)
HDFE Linear regression Number of obs = 600
Absorbing 2 HDFE groups F( 11, 520) = 75233.93
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.9996
Adj R-squared = 0.9996
Within R-sq. = 0.9994
Root MSE = 1.0147
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
c.D#c.year2 | .4381916 .3796627 1.15 0.249 -.3076697 1.184053
c.D#c.year3 | .6093975 .3984095 1.53 0.127 -.1732924 1.392087
c.D#c.year4 | .4808495 .3948783 1.22 0.224 -.2949033 1.256602
c.D#c.year5 | .1168801 .4088713 0.29 0.775 -.6863626 .9201227
c.D#c.year6 | 23.81018 .3870237 61.52 0.000 23.04986 24.5705
c.D#c.year7 | 28.48194 .3664986 77.71 0.000 27.76194 29.20194
c.D#c.year8 | 31.9992 .3978656 80.43 0.000 31.21758 32.78082
c.D#c.year9 | 36.2474 .4087051 88.69 0.000 35.44448 37.05031
c.D#c.year10| 40.45248 .3979999 101.64 0.000 39.67059 41.23436
x1| 4.996797 .0061877 807.54 0.000 4.984641 5.008953
x2| 3.004127 .0087679 342.63 0.000 2.986902 3.021352
Absorbed degrees of freedom:
Absorbed FE | Num. Coefs. = Categories - Redundant
id | 60 60 0
time | 9 10 1
coefplot, ///
keep(c.D#c.year2 c.D#c.year3 c.D#c.year4 c.D#c.year5 c.D#c.year6 c.D#c.year7 c.D#c.year8 c.D#c.year9 c.D#c.year10) ///
coeflabels(c.D#c.year2 = "-4" ///
c.D#c.year3 = "-3" ///
c.D#c.year4 = "-2" ///
c.D#c.year5 = "-1" ///
c.D#c.year6 = "0" ///
c.D#c.year7 = "1" ///
c.D#c.year8 = "2" ///
c.D#c.year9 = "3" ///
c.D#c.year10 = "4") ///
vertical ///
yline(0) ///
ytitle("Coef") ///
xtitle("Years relative to the year of policy adoption") ///
addplot(line @b @at) ///
ciopts(recast(rcap))
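When the effect grows over time, the year-by-year coefficients can be summarized as an average post-policy effect. The command below is a sketch that assumes the reghdfe model with c.D#(c.year2-year10) from this subsection is the active estimation result. Each coefficient is measured relative to the base year 2000.

```stata
* Average of the five post-policy interaction coefficients (2005 to 2009)
lincom (c.D#c.year6 + c.D#c.year7 + c.D#c.year8 + c.D#c.year9 + c.D#c.year10) / 5
```

Since the simulated effects are 24, 28, 32, 36, and 40, the result should land near 32, in line with the single interaction coefficient reported for the pooled Standard DID model earlier.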
